DOES WASTE-RECYCLING REALLY IMPROVE THE 
MULTI-PROPOSAL METROPOLIS-HASTINGS MONTE CARLO 
ALGORITHM? 

Abstract. The waste-recycling Monte Carlo (WR) algorithm introduced by physicists is 
a modification of the (multi-proposal) Metropolis-Hastings algorithm, which makes use of 
all the proposals in the empirical mean, whereas the standard (multi-proposal) Metropolis- 
Hastings algorithm only uses the accepted proposals. In this paper, we extend the WR 
algorithm into a general control variate technique and exhibit the optimal choice of the 
control variate in terms of asymptotic variance. We also give an example which shows 
that in contradiction to the intuition of physicists, the WR algorithm can have an asymp- 
totic variance larger than the one of the Metropolis-Hastings algorithm. However, in the 
particular case of the Metropolis-Hastings algorithm called Boltzmann algorithm, we prove 
that the WR algorithm is asymptotically better than the Metropolis-Hastings algorithm. 
This last property is also true for the multi-proposal Metropolis-Hastings algorithm. In this 
last framework, we consider a linear parametric generalization of WR, and we propose an 
estimator of the explicit optimal parameter using the proposals. 



1. Introduction 

The Metropolis-Hastings algorithm is used to compute the expectation $\langle \pi, f \rangle$ of a function 
$f$ under a probability measure $\pi$ difficult to simulate. It relies on the construction by an 
appropriate acceptation/rejection procedure of a Markov chain $(X_k, k \geq 0)$ with transition 
kernel $P$ such that $\pi$ is reversible with respect to $P$ and the quantity of interest $\langle \pi, f \rangle$ is 
estimated by the empirical mean $I_n(f) = \frac{1}{n} \sum_{k=1}^{n} f(X_k)$. We shall recall the well-known 
properties of this estimation (consistency, asymptotic normality) in what follows. In particular, 
the quality or precision of the algorithm is measured through the asymptotic variance 
of the estimator of $\langle \pi, f \rangle$. 

The waste-recycling Monte Carlo (WR) algorithm, introduced by physicists, is a modi- 
fication of the Metropolis-Hastings algorithm, which makes use of all the proposals in the 
empirical mean, whereas the standard Metropolis-Hastings algorithm only uses the accepted 
proposals. To our knowledge, the WR algorithm was first introduced in 1977 by Ceperley, 
Chester and Kalos in equation (35) p. 3085 [4]. Without any proof, they claim that "The 
advantage of using this form is that some information about unlikely moves appears in the 
final answer, and the variance is lowered". It is commonly assumed among the physicists 
and supported by most of the simulations that the WR algorithm is more efficient than the 
Metropolis-Hastings algorithm, that is the estimation given by the WR algorithm is consistent 
and has a smaller asymptotic variance. Another way to speed up the Metropolis-Hastings 
algorithm could be to use multiple proposals at each step instead of only one. According to 
Frenkel [6], the waste recycling can be particularly useful for these algorithms where many 
states are rejected. 

Our aim is to clarify the presentation of the WR algorithms with one proposal and with 
multiple proposals and to present a first rigorous study of those algorithms. We will give in 
Section 2 an introduction to our results in the finite state space case. Our main new results 
are stated in Theorem 3.4, which is a first step towards the comparison of the asymptotic 
variances. We shall detail their consequences in the didactic Section 2 for: 

- the WR algorithm through Propositions 2.1 (consistency of the estimation), 2.2 
(asymptotic normality) and 2.3 (a first partial answer to the initial question: Does 
waste-recycling really improve the Metropolis-Hastings Monte Carlo algorithm?), 

- the multi-proposal WR algorithm through Propositions 2.7 (consistency of the 
estimation and asymptotic normality) and 2.8 (a second partial answer to the initial 
question: Does waste-recycling really improve the Metropolis-Hastings Monte Carlo 
algorithm?). 

The study of the WR estimator in the form $I_n(f) + J_n(f)$, for a given functional $J_n$, leads 
us to rewrite the WR algorithm as a particular case of a general control variate problem by 
considering the estimators $I_n(f) + J_n(\psi)$ where the function $\psi$ is possibly different from $f$. 
In the multi-proposal framework, the consistency (or convergence) of this general algorithm 
and its asymptotic normality are stated in Theorem 3.4 in Section 3. We also give its 
asymptotic variance and prove that the optimal choice of $\psi$ in terms of asymptotic variance 
is the solution, $F$, of the Poisson equation (6). This choice achieves variance reduction, but 
the function $F$ is difficult to compute. It is possible to replace it by an approximation. In 
some sense, $f$ is such an approximation and for this particular choice we recover the Waste 
Recycling estimator introduced by physicists. In Section 5, which is dedicated to the single 
proposal case, we give a simple counter-example (see paragraph 5.2) which shows that the 
WR algorithm does not in general improve the Metropolis-Hastings algorithm: the WR 
algorithm can have an asymptotic variance larger than the one of the Metropolis-Hastings 
algorithm. Since then, Athenes [3] has also observed an increase of the variance in some numerical 
computations of free energy. However, in the particular case of the Metropolis-Hastings 
algorithm called Boltzmann algorithm, we prove in Section 4 that the (multi-proposal) WR 
algorithm is asymptotically better than the (multi-proposal) Metropolis-Hastings algorithm. 
In this particular framework, we give explicitly the optimal value $b_*$ of $b$ for the parametric control 
variate $J_n(bf)$. This optimal value can be estimated using the Markov chain $(X_k, 0 \leq k \leq n)$. 

Acknowledgments. We warmly thank Manuel Athenes (CEA Saclay) for presenting the 
waste recycling Monte Carlo algorithm to us and Randal Douc (CMAP Ecole Polytechnique) 
for numerous fruitful discussions. We also thank the referees for their valuable comments. 

2. Didactic version of the results 

For simplicity, we assume in the present section that $E$ is a finite set. Let $\langle \nu, h \rangle = 
\sum_{x \in E} \nu(x) h(x)$ denote the "integration" of a real function defined on $E$, $h = (h(x), x \in E)$, 
w.r.t. a measure on $E$, $\nu = (\nu(x), x \in E)$. 

Let $\pi$ be a probability measure on $E$ such that $\pi(x) > 0$ for all $x \in E$ and $f$ a real function 
defined on $E$. The Metropolis-Hastings algorithm gives an estimation of $\langle \pi, f \rangle$ as the a.s. 
limit of the empirical mean of $f$, $\frac{1}{n} \sum_{k=1}^{n} f(X_k)$, as $n$ goes to infinity, where $X = (X_n, n \geq 0)$ 
is a Markov chain which is reversible with respect to the probability measure $\pi$. 



WRMC 






2.1. The Metropolis-Hastings algorithm. The Markov chain $X = (X_n, n \in \mathbb{N})$ of the 
Metropolis-Hastings algorithm is built in the following way. Let $Q$ be an irreducible transition 
matrix over $E$ such that for all $x, y \in E$, if $Q(x,y) = 0$ then $Q(y,x) = 0$. The transition 
matrix $Q$ is called the selection matrix. 

For $x, y \in E$ such that $Q(x,y) > 0$, let $(\rho(x,y), \rho(y,x)) \in (0,1]^2$ be such that 

(1) $\rho(x,y)\,\pi(x)\,Q(x,y) = \rho(y,x)\,\pi(y)\,Q(y,x)$. 

The function $\rho$ is viewed as an acceptance probability. For example, one gets such a function 
$\rho$ by setting 

(2) $\rho(x,y) = \gamma\left( \frac{\pi(y) Q(y,x)}{\pi(x) Q(x,y)} \right)$ for all $x, y \in E$ s.t. $Q(x,y) > 0$, 

where $\gamma$ is a function with values in $(0,1]$ such that $\gamma(u) = u\,\gamma(1/u)$. Usually, one takes 
$\gamma(u) = \min(1,u)$ for the Metropolis algorithm. The case $\gamma(u) = u/(1+u)$ is known as the 
Boltzmann algorithm or Barker algorithm. 

Let $X_0$ be a random variable taking values in $E$ with probability distribution $\nu_0$. At step $n$, 
$X_0, \ldots, X_n$ are given. The proposal at step $n+1$, $\tilde X_{n+1}$, is distributed according to $Q(X_n, \cdot)$. 
This proposal is accepted with probability $\rho(X_n, \tilde X_{n+1})$ and then $X_{n+1} = \tilde X_{n+1}$. If it is 
rejected, then we set $X_{n+1} = X_n$. 
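
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the three-state target `pi`, the selection matrix `Q`, the function `f` and the helper name `metropolis_hastings` are hypothetical choices made for the example, not taken from the paper.

```python
import random

def metropolis_hastings(pi, Q, gamma, x0, n, rng):
    """Simulate X_0, ..., X_n; return (states, proposals).

    pi    : target probabilities on E = {0, ..., len(pi) - 1}
    Q     : selection matrix (list of rows)
    gamma : function with gamma(u) = u * gamma(1 / u), as in (2)
    """
    states, proposals = [x0], []
    support = list(range(len(pi)))
    for _ in range(n):
        x = states[-1]
        y = rng.choices(support, weights=Q[x])[0]       # proposal drawn from Q(x, .)
        u = pi[y] * Q[y][x] / (pi[x] * Q[x][y])         # ratio appearing in (2)
        rho = gamma(u)                                  # acceptance probability
        proposals.append(y)
        states.append(y if rng.random() < rho else x)   # accept, or stay at x
    return states, proposals

metropolis = lambda u: min(1.0, u)      # Metropolis choice of gamma
boltzmann = lambda u: u / (1.0 + u)     # Boltzmann (Barker) choice of gamma

# Hypothetical example: three states, uniform proposal among the other states.
pi = [0.2, 0.3, 0.5]
Q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
f = [0.0, 1.0, 2.0]

states, proposals = metropolis_hastings(pi, Q, metropolis, 0, 20000, random.Random(0))
estimate = sum(f[x] for x in states[1:]) / (len(states) - 1)   # empirical mean of f
exact = sum(p * v for p, v in zip(pi, f))                      # <pi, f>
```

The proposals are returned alongside the states because the waste-recycling estimators discussed later reuse them.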

It is easy to check that $X = (X_n, n \geq 0)$ is a Markov chain with transition matrix $P$ 
defined by 

(3) $\forall x, y \in E, \quad P(x,y) = \begin{cases} Q(x,y)\,\rho(x,y) & \text{if } x \neq y, \\ 1 - \sum_{z \neq x} P(x,z) & \text{if } x = y. \end{cases}$ 

Furthermore $X$ is reversible w.r.t. the probability measure $\pi$: $\pi(x)P(x,y) = \pi(y)P(y,x)$ 
for all $x, y \in E$. This property is also called detailed balance. By summation over $y \in E$, 
one deduces that $\pi$ is an invariant probability for $P$ (i.e. $\pi P = \pi$). The irreducibility of $Q$ 
implies that $P$ is irreducible. Since the probability measure $\pi$ is invariant for $P$, we deduce 
that $X$ is positive recurrent with (unique) invariant probability measure $\pi$. In particular, 
for any real valued function $f$ defined on $E$, the ergodic theorem (see e.g. [8]) implies the 
consistency of the estimation: 

$\lim_{n \to \infty} I_n(f) = \langle \pi, f \rangle$ a.s., 

where 

(4) $I_n(f) = \frac{1}{n} \sum_{k=1}^{n} f(X_k)$. 

The asymptotic normality of the estimator $I_n(f)$ is given by the following central limit 
theorem (see [5] or [8]) 

$\sqrt{n}\,\big( I_n(f) - \langle \pi, f \rangle \big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \sigma(f)^2)$. 

Here $\mathcal{N}(0, \sigma^2)$ denotes the Gaussian distribution with mean $0$ and variance $\sigma^2$, the 
convergence holds in the distribution sense and 

(5) $\sigma(f)^2 = \langle \pi, F^2 \rangle - \langle \pi, (PF)^2 \rangle$, 

where $F$ denotes the unique solution up to an additive constant of the Poisson equation 

(6) $F(x) - PF(x) = f(x) - \langle \pi, f \rangle, \quad x \in E$, 

and $Ph(x) = \sum_{y \in E} P(x,y) h(y)$. Improving the Metropolis-Hastings algorithm means 
exhibiting other estimators of $\langle \pi, f \rangle$ that are still consistent (i.e. estimators which converge a.s. 
to $\langle \pi, f \rangle$) but with an asymptotic variance smaller than $\sigma(f)^2$. 
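
On a small finite state space the quantities in (3), (5) and (6) can be computed exactly. The sketch below is a hedged illustration: the helper names and the three-state example are assumptions, and the Poisson equation is solved by truncating the series $F = \sum_{k \geq 0} P^k (f - \langle \pi, f \rangle)$, which converges here because the example chain is irreducible and aperiodic.

```python
def transition_matrix(pi, Q, gamma):
    """Matrix P of (3): Q(x,y) * rho(x,y) off the diagonal, rows summing to 1."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and Q[x][y] > 0:
                P[x][y] = Q[x][y] * gamma(pi[y] * Q[y][x] / (pi[x] * Q[x][y]))
        P[x][x] = 1.0 - sum(P[x])   # diagonal entry is still 0 when the sum is taken
    return P

def mean(pi, h):
    """<pi, h> for a function h given as a list of values."""
    return sum(p * v for p, v in zip(pi, h))

def apply(P, h):
    """The function Ph, as a list of values."""
    return [sum(p * v for p, v in zip(row, h)) for row in P]

def poisson_solution(pi, P, f, terms=2000):
    """Truncated series F = sum_k P^k (f - <pi, f>), one solution of (6)."""
    m = mean(pi, f)
    term = [v - m for v in f]
    F = [0.0] * len(f)
    for _ in range(terms):
        F = [a + b for a, b in zip(F, term)]
        term = apply(P, term)
    return F

def sigma2(pi, P, f):
    """Asymptotic variance (5)."""
    F = poisson_solution(pi, P, f)
    PF = apply(P, F)
    return mean(pi, [a * a for a in F]) - mean(pi, [b * b for b in PF])

# Hypothetical example (Metropolis rule, uniform proposal among the other states).
pi = [0.2, 0.3, 0.5]
Q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
f = [0.0, 1.0, 2.0]
P = transition_matrix(pi, Q, lambda u: min(1.0, u))
F = poisson_solution(pi, P, f)
variance = sigma2(pi, P, f)
```

For larger state spaces a direct linear solve would be preferable to the truncated series; the series is used here only to keep the sketch free of dependencies.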

2.2. WR algorithm. The classical estimation of $\langle \pi, f \rangle$ by the empirical mean $I_n(f)$ makes 
no use of the proposals which have been rejected. For a long time, physicists have claimed 
that the efficiency of the estimation can be improved by including these rejected states in the 
sampling procedure. They suggest using the so-called Waste-Recycling Monte Carlo (WR) 
algorithm, which consists in replacing $f(X_k)$ in $I_n(f)$ by a weighted average of $f(X_{k-1})$ and 
$f(\tilde X_k)$. For the natural choice of weights corresponding to the conditional expectation of 
$f(X_k)$ w.r.t. $(X_{k-1}, \tilde X_k)$, one gets the following estimator of $\langle \pi, f \rangle$: 

$I_n^{WR}(f) = \frac{1}{n} \sum_{k=0}^{n-1} E\big[ f(X_{k+1}) \,\big|\, X_k, \tilde X_{k+1} \big]$ 

$\phantom{I_n^{WR}(f)} = \frac{1}{n} \sum_{k=0}^{n-1} \Big( \rho(X_k, \tilde X_{k+1}) f(\tilde X_{k+1}) + \big(1 - \rho(X_k, \tilde X_{k+1})\big) f(X_k) \Big)$. 

We shall study in Section 6.2 another choice for the weights also considered by Frenkel [7]. 
Notice that the WR algorithm requires the evaluation of $f$ for all the proposals whereas the 
Metropolis-Hastings algorithm evaluates $f$ only for the accepted proposals. Other algorithms 
using all the proposals, such as the Rao-Blackwell Metropolis-Hastings algorithm, have been 
studied, see for example section 6.4.2 in [11] and references therein. In the Rao-Blackwell 
Metropolis-Hastings algorithm, the weight of $f(\tilde X_{k+1})$ depends on all the proposals $\tilde X_1, \ldots, \tilde X_n$. 
It is thus necessary to keep in memory the values of all proposals in order to compute the 
estimation of $\langle \pi, f \rangle$. 
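
As an illustration of the definition above, the WR estimator can be computed from a recorded trajectory and its proposals. The sketch below is hedged: the function name `waste_recycling`, the tiny hand-written trajectory and the acceptance rule (Metropolis with a symmetric selection matrix) are assumptions chosen so that the result can be checked by hand.

```python
def waste_recycling(f, rho, states, proposals):
    """Return (I_n(f), I_n^WR(f)).

    states    : X_0, ..., X_n
    proposals : proposals of steps 1..n (proposals[k] is proposed from states[k])
    rho       : acceptance probability rho(x, y)
    """
    n = len(proposals)
    i_n = sum(f(x) for x in states[1:]) / n
    # zip pairs X_k with the proposal made from it, for k = 0, ..., n - 1
    i_wr = sum(rho(x, y) * f(y) + (1.0 - rho(x, y)) * f(x)
               for x, y in zip(states, proposals)) / n
    return i_n, i_wr

# Hypothetical data: target pi on {0, 1, 2}, symmetric Q, Metropolis acceptance.
pi = [0.2, 0.3, 0.5]
rho = lambda x, y: min(1.0, pi[y] / pi[x])
states = [0, 1, 1, 2]      # second proposal (state 0) was rejected
proposals = [1, 0, 2]

i_n, i_wr = waste_recycling(float, rho, states, proposals)
# By hand: I_n = (1 + 1 + 2) / 3 and I_n^WR = (1 + 1/3 + 2) / 3.
```

Only the current state and the current proposal are needed at each step, so the estimator can be accumulated online without storing the trajectory.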

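One easily checks that $I_n^{WR}(f) - I_n(f) = J_n(f)$ where for any real function $\psi$ defined on 
$E$, 

$J_n(\psi) = \frac{1}{n} \sum_{k=0}^{n-1} \Big( E\big[ \psi(X_{k+1}) \,\big|\, X_k, \tilde X_{k+1} \big] - \psi(X_{k+1}) \Big)$ 

$\phantom{J_n(\psi)} = \frac{1}{n} \sum_{k=0}^{n-1} \Big( \rho(X_k, \tilde X_{k+1}) \psi(\tilde X_{k+1}) + \big(1 - \rho(X_k, \tilde X_{k+1})\big) \psi(X_k) - \psi(X_{k+1}) \Big)$. 

Notice that $J_n(\psi) = 0$ when $\psi$ is constant. We can consider a more general estimator of 
$\langle \pi, f \rangle$ given by 

$I_n(f, \psi) = I_n(f) + J_n(\psi)$. 

Notice that $I_n^{WR}(f) = I_n(f,f)$ and $I_n(f) = I_n(f,0)$. It is easy to check that the bias of the 
estimator $I_n(f,\psi)$ does not depend on $\psi$: $E[I_n(f,\psi)] = E[I_n(f)]$. Theorem 3.4 implies the 
following result on the estimator $I_n(f,\psi)$. 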

Proposition 2.1. For any real functions $\psi$ and $f$ defined on $E$, the estimator $I_n(f,\psi)$ of 
$\langle \pi, f \rangle$ is consistent: a.s. $\lim_{n \to \infty} I_n(f,\psi) = \langle \pi, f \rangle$. 

From this result, $J_n(\psi)$ can be seen as a control variate and it is natural to look for $\psi$ 
which minimizes the variance or the asymptotic variance of $I_n(f,\psi)$. Another class of control 
variates has been studied in [2] in the particular case of the Independent Metropolis-Hastings 
algorithm where $Q(x, \cdot)$ does not depend on $x$. 



The last part of Theorem 3.4 implies the following result, where we used Lemma 5.1 to 
derive the asymptotic variance expression. We shall write $E_\pi$ when $X_0$ is distributed under 
its invariant measure $\pi$ (in particular $\langle \pi, f \rangle = E_\pi[f(X_0)]$). 

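Proposition 2.2. For any real functions $\psi$ and $f$ defined on $E$, the estimator $I_n(f,\psi)$ of 
$\langle \pi, f \rangle$ is asymptotically normal: 

$\sqrt{n}\,\big( I_n(f,\psi) - \langle \pi, f \rangle \big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \sigma(f,\psi)^2)$, 

with asymptotic variance $\sigma(f,\psi)^2$ given by 

$\sigma(f,\psi)^2 = \sigma(f)^2 - E_\pi\Big[ \big(1 - \rho(X_0,X_1)\big) \big( F(X_1) - F(X_0) \big)^2 \Big]$ 

$\phantom{\sigma(f,\psi)^2 =} + E_\pi\Big[ \big(1 - \rho(X_0,X_1)\big) \big( \psi(X_1) - F(X_1) - \psi(X_0) + F(X_0) \big)^2 \Big]$, 

where $F$ solves the Poisson equation (6). In particular, for fixed $f$, the asymptotic variance 
$\sigma(f,\psi)^2$ is minimal for $\psi = F$ and this choice achieves variance reduction: $\sigma(f,F)^2 \leq \sigma(f)^2$. 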

Although optimal in terms of the asymptotic variance, the estimator $I_n(f,F)$ cannot be 
used in practice, since computing a solution of the Poisson equation is more complicated 
than computing $\langle \pi, f \rangle$. Nevertheless, the Proposition suggests that using $I_n(f,\psi)$ where $\psi$ 
is an approximation of $F$ might lead to a smaller asymptotic variance than in the standard 
Metropolis-Hastings algorithm. Some hint at the computation of an approximation of $F$ by a 
Monte Carlo approach is for instance given in [9] p. 418-419. Because of the series expansion 
$F = \sum_{k \geq 0} P^k (f - \langle \pi, f \rangle)$, $f$ can be seen as an approximation of $F$ of order 0. Hence the 
asymptotic variance of $I_n^{WR}(f) = I_n(f,f)$ might be smaller than the one of $I_n(f)$ in some 
situations. It is common belief in the physicist community, see [4] or [7], that the inequality 
is always true. Notice that, as remarked by Frenkel in a particular case [7], the variance of 
each term of the sum in $I_n^{WR}(f)$ is equal to or smaller than the variance of each term of the 
sum in $I_n(f)$ by Jensen's inequality. But one also has to compare the covariance terms, which 
is not so obvious. We investigate whether the asymptotic variance of the WR algorithm is 
smaller than the one of the standard Metropolis algorithm and reach the following conclusion 
which contradicts the intuition. 



Proposition 2.3. 

i) In the Metropolis case, that is when (2) holds with $\gamma(u) = \min(1,u)$, then it may 
happen that $\sigma(f,f)^2 > \sigma(f)^2$. 

ii) When (2) holds with $\gamma(u) = \frac{\alpha u}{1+u}$, for some $\alpha \in (0,2)$, then we have $\sigma(f,f)^2 \leq 
\sigma(f)^2$. Furthermore, for $f$ non constant, the function $b \mapsto \sigma(f,bf)^2$ is minimal at 

(7) $b_* = \frac{\langle \pi, f^2 \rangle - \langle \pi, f \rangle^2}{\langle \pi, f^2 - fPf \rangle}$ 

and $b_* \geq 1/\alpha$. When $\alpha = 1$, if, moreover, $\sigma(f,f)^2 > 0$, then $b_* > 1$. 
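
To make the statement concrete, the asymptotic variances can be evaluated exactly on a small example. The sketch below is hedged: the three-state chain and all helper names are hypothetical, the Boltzmann rule ($\alpha = 1$) is assumed, the Poisson solution is obtained from a truncated series, and the variance is computed as $\sigma(f,\psi)^2 = \sigma(f)^2 + E_\pi[(1-\rho(X_0,X_1))\{(\psi(X_1) - F(X_1) - \psi(X_0) + F(X_0))^2 - (F(X_1) - F(X_0))^2\}]$.

```python
def build(pi, Q, gamma):
    """Acceptance probabilities rho from (2) and transition matrix P from (3)."""
    n = len(pi)
    rho = [[gamma(pi[y] * Q[y][x] / (pi[x] * Q[x][y])) if x != y and Q[x][y] > 0 else 0.0
            for y in range(n)] for x in range(n)]
    P = [[Q[x][y] * rho[x][y] for y in range(n)] for x in range(n)]
    for x in range(n):
        P[x][x] = 1.0 - sum(P[x])   # diagonal entry is 0 before this assignment
    return rho, P

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(P, h):
    return [dot(row, h) for row in P]

def poisson(pi, P, f, terms=5000):
    """Truncated series F = sum_k P^k (f - <pi, f>), a solution of (6)."""
    m = dot(pi, f)
    term, F = [v - m for v in f], [0.0] * len(f)
    for _ in range(terms):
        F = [a + b for a, b in zip(F, term)]
        term = apply(P, term)
    return F

def sigma2(pi, P, rho, f, psi):
    """sigma(f, psi)^2; psi = 0 gives sigma(f)^2 of (5)."""
    F = poisson(pi, P, f)
    PF = apply(P, F)
    total = dot(pi, [a * a - b * b for a, b in zip(F, PF)])
    n = len(pi)
    for x in range(n):
        for y in range(n):
            if x != y:  # terms with X_1 = X_0 vanish
                w = pi[x] * P[x][y] * (1.0 - rho[x][y])
                g = psi[y] - F[y] - psi[x] + F[x]
                total += w * (g * g - (F[y] - F[x]) ** 2)
    return total

def b_star(pi, P, f):
    """Optimal parameter (7)."""
    m = dot(pi, f)
    var = dot(pi, [v * v for v in f]) - m * m
    Pf = apply(P, f)
    return var / dot(pi, [v * v - v * pv for v, pv in zip(f, Pf)])

# Hypothetical example with the Boltzmann rule gamma(u) = u / (1 + u).
pi = [0.2, 0.3, 0.5]
Q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
f = [0.0, 1.0, 2.0]
rho, P = build(pi, Q, lambda u: u / (1.0 + u))

s_mh = sigma2(pi, P, rho, f, [0.0] * 3)             # Metropolis-Hastings estimator
s_wr = sigma2(pi, P, rho, f, f)                     # WR estimator
b = b_star(pi, P, f)
s_opt = sigma2(pi, P, rho, f, [b * v for v in f])   # WR with optimal parameter
```

Replacing the Boltzmann rule by `lambda u: min(1.0, u)` lets one explore the Metropolis case numerically; this particular example is not claimed to be a counter-example.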



Remark 2.4. Assume that $f$ is not constant. The optimal parameter $b_*$ defined by (7) can 
be estimated by 

$\hat b_n = \frac{I_n(f^2) - I_n(f)^2}{I_n(f^2) - \frac{1}{n} \sum_{k=1}^{n} f(X_{k-1}) f(X_k)}$. 

Notice that a.s. $\lim_{n \to \infty} \hat b_n = b_*$ thanks to the ergodic theorem. Using Slutsky's theorem, 
one can deduce from Proposition 2.2 that $I_n(f) + \hat b_n J_n(f) = I_n(f, \hat b_n f)$ is an asymptotically 
normal estimator of $\langle \pi, f \rangle$ with asymptotic variance $\sigma(f, b_* f)^2$. Thus, in the framework ii) 
of Proposition 2.3, using the control variate $\hat b_n J_n(f)$ strictly improves the WR estimator as 
soon as either $\alpha < 1$ or $\alpha = 1$ (Boltzmann algorithm) and $\sigma(f,f)^2$ is positive. Notice that 
when its asymptotic variance $\sigma(f,f)^2$ is zero, then the WR estimator $I_n^{WR}(f) = I_n(f,f)$ is 
equal to $\langle \pi, f \rangle$. 
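
The estimator of the optimal parameter only requires the trajectory itself. A minimal sketch, assuming a hypothetical helper name `b_hat` and a tiny hand-written trajectory chosen so that the value can be checked by hand:

```python
def b_hat(f, states):
    """Estimate b_* from X_0, ..., X_n (the denominator must be nonzero)."""
    n = len(states) - 1
    vals = [f(x) for x in states]
    i_f = sum(vals[1:]) / n                                   # I_n(f)
    i_f2 = sum(v * v for v in vals[1:]) / n                   # I_n(f^2)
    cross = sum(a * b for a, b in zip(vals, vals[1:])) / n    # (1/n) sum f(X_{k-1}) f(X_k)
    return (i_f2 - i_f ** 2) / (i_f2 - cross)

# Hypothetical trajectory on {0, 1, 2} with f(x) = x.
states = [0, 1, 1, 2]
value = b_hat(float, states)
# By hand: I_n(f) = 4/3, I_n(f^2) = 2, cross term = 1, so the estimate is (2 - 16/9) / 1 = 2/9.
```

On such a short trajectory the estimate is of course far from $b_*$; the consistency stated in the remark is an asymptotic property.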

To prove assertion i), we give an explicit counter-example such that $\sigma(f,f)^2 > \sigma(f)^2$ in 
the Metropolis case (see Section 5 and equation (32)). The assertion ii) is also proved in 
Section 5 (see Proposition 5.3). Let us make some comments on its hypothesis, which holds 
with $\alpha = 1$ for the Boltzmann acceptation rule. 

• By (1) and since $\rho(x,y)$ is an acceptance probability, the constant $\alpha$ has to be smaller 
than $1 + \min_{x \neq y,\, Q(x,y) > 0} \frac{\pi(y) Q(y,x)}{\pi(x) Q(x,y)}$. 

• If there exists a constant $c > 0$ s.t. for all distinct $x, y \in E$ s.t. $Q(x,y) > 0$, the 
quantity $\frac{\pi(x) Q(x,y)}{\pi(y) Q(y,x)}$ is equal to $c$ or $1/c$ and (2) holds with $\gamma$ such that $\gamma(1/c) = 
\gamma(c)/c$ then the hypothesis holds with $\alpha = \gamma(c) + \gamma(1/c)$. For example assume that 
the transition matrix $Q$ is symmetric and that $\pi$ is written as a Gibbs distribution: 
for all $x \in E$, $\pi(x) = e^{-H(x)} / \sum_{y \in E} e^{-H(y)}$ for some energy function $H$. If the energy 
increases or decreases by the same amount $\varepsilon$ for all the authorized transitions, then 
$\frac{\pi(x) Q(x,y)}{\pi(y) Q(y,x)}$ is equal to $c$ or $1/c$ with $c = e^{\varepsilon}$. 

According to [10], since for all $u > 0$, $\frac{u}{1+u} < \min(1,u)$, in the absence of waste recycling, 
the asymptotic variance $\sigma(f)^2$ is smaller in the Metropolis case than in the Boltzmann case for 
given $\pi$, $Q$ and $f$. So waste recycling always achieves variance reduction only for the worst 
choice of $\gamma$. Notice however that the Boltzmann algorithm is used in the multi-proposal 
framework where we generalize our results. 

Remark 2.5. When the computation of $Pg$ is feasible for any function $g : E \to \mathbb{R}$ (typically 
when, for every $x \in E$, the cardinal of $\{y \in E : Q(x,y) > 0\}$ is small), then it is possible to 
use $I_n(\psi - P\psi)$ as a control variate and approximate $\langle \pi, f \rangle$ by $I_n(f - (\psi - P\psi))$. Since $\pi$ is 
invariant with respect to $P$, $\langle \pi, \psi - P\psi \rangle = 0$ and a.s. $I_n(f - (\psi - P\psi))$ converges to $\langle \pi, f \rangle$ 
as $n$ tends to infinity. Moreover, the asymptotic variance of the estimator is $\sigma(f - \psi + P\psi)^2$. 
Last, remarking that 

(8) $I_n(\psi - P\psi) = \frac{1}{n} \sum_{k=1}^{n} \big( \psi(X_k) - P\psi(X_{k-1}) \big) + \frac{1}{n} \big( P\psi(X_0) - P\psi(X_n) \big)$ 

one obtains that the bias difference $E[I_n(f - \psi + P\psi)] - E[I_n(f)] = \frac{1}{n} E[P\psi(X_n) - P\psi(X_0)]$ 
is smaller in absolute value than $2 \max_{x \in E} |\psi(x)| / n$. 

For the choice $\psi = F$, this control variate is perfect, since according to (6), for each $n \in \mathbb{N}^*$, 
$I_n(f - (F - PF))$ is constant and equal to $\langle \pi, f \rangle$. 
For the choice $\psi = f$, the asymptotic variance of the estimator $I_n(Pf)$ is also smaller than 
the one of $I_n(f)$. Indeed setting $f_0 = f - \langle \pi, f \rangle$, we have 

$\sigma(f)^2 - \sigma(Pf)^2 = \langle \pi, F^2 + (P^2F)^2 - 2(PF)^2 \rangle$ 

$= \langle \pi, (f_0 + PF)^2 - 2(PF)^2 + (Pf_0 - PF)^2 \rangle$ 

$= \langle \pi, f_0^2 + 2 f_0 P(F - PF) + (Pf_0)^2 \rangle = \langle \pi, (f_0 + Pf_0)^2 \rangle$ 

where we used that $PF$ solves the Poisson equation (6) with $f$ replaced by $Pf$ and (5) for 
the first equality, (6) for the second and last equalities and the reversibility of $\pi$ w.r.t. $P$ for 
the third one. 

Notice the control variate $J_n(\psi)$ is similar to $I_n(\psi - P\psi)$ except that the conditional 
expectation $P\psi(X_{k-1})$ of $\psi(X_k)$ given $X_{k-1}$ in the first term of the r.h.s. of (8) is replaced by 
the conditional expectation of $\psi(X_k)$ given $(X_{k-1}, \tilde X_k)$, which can always be easily computed. 
From this perspective, the minimality of the asymptotic variance of $I_n(f,\psi)$ for $\psi = F$ is not 
a surprise. 

The comparison between $\sigma(f,\psi)^2$ and $\sigma(f - \psi + P\psi)^2$ can be deduced from Section 6.1 
which is stated in the more general multi-proposal framework introduced in the next 
paragraph. Notice that the sign of $\sigma(f,\psi)^2 - \sigma(f - \psi + P\psi)^2$ depends on $\psi$. 

2.3. Multi-proposal WR algorithm. In the classical Metropolis-Hastings algorithm, there 
is only one proposal $\tilde X_{n+1}$ at step $n+1$. Around 1990, some extensions where only one 
state among multiple proposals is accepted have been proposed in order to speed up the 
exploration of $E$ (see [1] for a unifying presentation of MCMC algorithms including the 
multi-proposal Metropolis-Hastings algorithm). According to Frenkel [6], the waste recycling 
can be particularly useful for these algorithms where many states are rejected. 

To formalize these algorithms, we introduce a proposition kernel $Q : E \times \mathcal{P}(E) \to [0,1]$, 
where $\mathcal{P}(E)$ denotes the set of subsets of $E$, which describes how to randomly choose the set 
of proposals: 

(9) $\forall x \in E, \quad Q(x,A) = 0$ if $x \notin A$ and $\sum_{A \in \mathcal{P}(E)} Q(x,A) = 1$. 

The second condition says that $Q(x, \cdot)$ is a probability on $\mathcal{P}(E)$. The first one ensures that 
the starting point is among the proposals. This last convention will allow us to transform 
the rejection/acceptation procedure into a selection procedure among the proposals. 

The selection procedure is described by a probability $\kappa$. For $(x,A) \in E \times \mathcal{P}(E)$, let 
$\kappa(x,A,\tilde x) \in [0,1]$ denote the probability of choosing $\tilde x \in A$ as the next state when the 
proposal set $A$ has been chosen. We assume that $\sum_{\tilde x \in A} \kappa(x,A,\tilde x) = 1$ (that is $\kappa(x,A,\cdot)$ is a 
probability measure) and that the following condition holds: 

(10) $\forall A \in \mathcal{P}(E), \ \forall x, \tilde x \in A, \quad \pi(x) Q(x,A) \kappa(x,A,\tilde x) = \pi(\tilde x) Q(\tilde x,A) \kappa(\tilde x,A,x)$. 

This condition is the analogue of (1) for a multi-proposal setting. For examples of non-trivial 
selection probability $\kappa$, see after Proposition 2.7. 

The Markov chain $X = (X_n, n \geq 0)$ is now defined inductively in the following way. Let 
$X_0$ be a random variable taking values in $E$ with probability distribution $\nu_0$. At step $n$, 
$X_0, \ldots, X_n$ are given. The proposal set at step $n+1$, $A_{n+1}$, is distributed according to 
$Q(X_n, \cdot)$. Then $X_{n+1}$ is chosen distributed according to $\kappa(X_n, A_{n+1}, \cdot)$. It is easy to check 
that $X$ is a Markov chain with transition matrix 

(11) $P(x,y) = \sum_{A \in \mathcal{P}(E) :\, x, y \in A} Q(x,A) \kappa(x,A,y)$. 

Condition (10) ensures that $X$ is reversible w.r.t. the probability measure $\pi$: $\pi(x)P(x,y) = 
\pi(y)P(y,x)$. 

Remark 2.6. The multi-proposal Metropolis-Hastings algorithm generalizes the Metropolis-Hastings 
algorithm, which can be recovered for the particular choice $Q(x, \{x,y\}) = Q(x,y)$ 
and, for $y \neq x$, $\kappa(x, \{x,y\}, y) = 1 - \kappa(x, \{x,y\}, x) = \rho(x,y)$. 

We keep the definition (4) of $I_n(f)$ but adapt the ones of $J_n(\psi)$ and $I_n(f,\psi)$ as follows: 

$J_n(\psi) = \frac{1}{n} \sum_{k=0}^{n-1} \Big( E\big[ \psi(X_{k+1}) \,\big|\, X_k, A_{k+1} \big] - \psi(X_{k+1}) \Big)$ 

(12) $\phantom{J_n(\psi)} = \frac{1}{n} \sum_{k=0}^{n-1} \Big( \sum_{\tilde x \in A_{k+1}} \kappa(X_k, A_{k+1}, \tilde x) \psi(\tilde x) - \psi(X_{k+1}) \Big)$ 

and $I_n(f,\psi) = I_n(f) + J_n(\psi)$. The Waste Recycling estimator of $\langle \pi, f \rangle$ studied by Frenkel 
in [6] is given by $I_n^{WR}(f) = I_n(f,f)$. Notice that the bias of the estimator $I_n(f,\psi)$ does not 
depend on $\psi$ (i.e. $E[I_n(f,\psi)] = E[I_n(f)]$). It turns out that Propositions 2.1 and 2.2 remain 
true in this multi-proposal framework (see Theorem 3.4) as soon as $P$ is irreducible. Notice 
that the irreducibility of $P$ holds if and only if for all $x' \neq y \in E$, there exist $m \geq 1$, distinct 
$x_0 = y, x_1, x_2, \ldots, x_m = x' \in E$ and $A_1, A_2, \ldots, A_m \in \mathcal{P}(E)$ such that for all $k \in \{1, \ldots, m\}$, 
$x_{k-1}, x_k \in A_k$ and 

(13) $\prod_{k=1}^{m} Q(x_{k-1}, A_k) \kappa(x_{k-1}, A_k, x_k) > 0$. 

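Proposition 2.7. Assume that $P$ is irreducible. For any real functions $\psi$ and $f$ defined on 
$E$, we have: 

• The estimator $I_n(f,\psi)$ of $\langle \pi, f \rangle$ is consistent: a.s. $\lim_{n \to \infty} I_n(f,\psi) = \langle \pi, f \rangle$. 

• The estimator $I_n(f,\psi)$ of $\langle \pi, f \rangle$ is asymptotically normal: 

$\sqrt{n}\,\big( I_n(f,\psi) - \langle \pi, f \rangle \big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \sigma(f,\psi)^2)$ 

where the asymptotic variance (still denoted by) $\sigma(f,\psi)^2$ is given by 

$\sigma(f,\psi)^2 = \sigma(f)^2 + \sum_{x \in E,\, A \in \mathcal{P}(E)} \pi(x) Q(x,A) \big[ \mathrm{Var}_{\kappa_{x,A}}(\psi - F) - \mathrm{Var}_{\kappa_{x,A}}(F) \big]$, 

with $\mathrm{Var}_{\kappa_{x,A}}(g) = \sum_{y \in A} \kappa(x,A,y) g(y)^2 - \Big( \sum_{y \in A} \kappa(x,A,y) g(y) \Big)^2$. 

• Moreover, for fixed $f$, the asymptotic variance $\sigma(f,\psi)^2$ is minimal for $\psi = F$ where $F$ 
solves the Poisson equation (6). In particular, this choice achieves variance reduction: 
$\sigma(f,F)^2 \leq \sigma(f)^2$. 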

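We now give two examples of non-trivial selection probability $\kappa$ which satisfies condition 
(10). The first one, $\kappa^M$, defined by 

(14) $\kappa^M(x,A,\tilde x) = \begin{cases} \dfrac{\pi(\tilde x) Q(\tilde x,A)}{\max\big( \pi(\tilde x) Q(\tilde x,A),\, \pi(x) Q(x,A) \big) + \sum_{z \in A \setminus \{x, \tilde x\}} \pi(z) Q(z,A)} & \text{if } \tilde x \neq x, \\ 1 - \sum_{z \in A \setminus \{x\}} \kappa^M(x,A,z) & \text{if } \tilde x = x, \end{cases}$ 

generalizes the Metropolis selection given by (2) with $\gamma(u) = \min(1,u)$. (Notice that for $\tilde x \neq x$ 
one has $\kappa^M(x,A,\tilde x) \leq \frac{\pi(\tilde x) Q(\tilde x,A)}{\sum_{z \in A \setminus \{x\}} \pi(z) Q(z,A)}$, which implies that $1 - \sum_{z \in A \setminus \{x\}} \kappa^M(x,A,z)$ 
is indeed non-negative.) The second one, $\kappa^B$, which does not depend on the initial point $x$, 
and is defined by 

(15) $\kappa^B(x,A,\tilde x) = \kappa^B(A,\tilde x) = \frac{\pi(\tilde x) Q(\tilde x,A)}{\sum_{z \in A} \pi(z) Q(z,A)}$, 

generalizes the Boltzmann (or Barker) selection given by (2) with $\gamma(u) = \frac{u}{1+u}$. Notice that 
for both choices, the irreducibility condition (13) can be expressed only in terms of $Q$: 

$\prod_{k=1}^{m} Q(x_{k-1}, A_k) Q(x_k, A_k) > 0$. 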

For the selection probability (15), we prove in Section 4 (see Proposition 4.1) that the 
Waste Recycling improves the Metropolis-Hastings algorithm: 

Proposition 2.8. When $\kappa = \kappa^B$ is given by (15) (Boltzmann or Barker case), then we have 
$\sigma(f,f)^2 \leq \sigma(f)^2$. Furthermore, for $f$ non constant, the function $b \mapsto \sigma(f,bf)^2$ is minimal at 
$b_*$ defined by (7) and $b_* > 1$ when $\sigma(f,f)^2 > 0$. 

Since for $\tilde x \neq x \in A$, $\kappa^M(x,A,\tilde x) \geq \kappa^B(A,\tilde x)$, according to [10], the asymptotic variance 
$\sigma(f)^2$ remains smaller in the Metropolis case than in the Boltzmann one. Nevertheless, it is 
likely that the difference decreases when the cardinality of the proposal sets increases. Notice 
that the optimal value $b_*$ can be estimated by $\hat b_n$ which is computed using the proposals: see 
Remark 2.4. The control variate $\hat b_n J_n(f)$ therefore improves the WR algorithm. 
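
The two selection probabilities (14) and (15) depend on the proposal set $A$ only through the weights $\pi(z)Q(z,A)$, $z \in A$. The sketch below is a hedged illustration: the function names and the dictionary `a` of weights are hypothetical (here $Q(z,A)$ is assumed to be the same for every $z \in A$, so that the weights are proportional to $\pi$).

```python
def kappa_boltzmann(a, x, y):
    """Selection probability (15); a[z] stands for pi(z) * Q(z, A), z in A."""
    return a[y] / sum(a.values())

def kappa_metropolis(a, x, y):
    """Selection probability (14)."""
    if y != x:
        others = sum(w for z, w in a.items() if z not in (x, y))
        return a[y] / (max(a[x], a[y]) + others)
    # remaining mass stays at the current state x
    return 1.0 - sum(kappa_metropolis(a, x, z) for z in a if z != x)

def wr_term(kappa, a, x, psi):
    """Conditional expectation of psi(X_{k+1}) given X_k = x and the proposal set, as in (12)."""
    return sum(kappa(a, x, y) * psi[y] for y in a)

# Hypothetical proposal set A = {0, 1, 2} with weights proportional to pi.
a = {0: 0.2, 1: 0.3, 2: 0.5}
psi = {0: 0.0, 1: 1.0, 2: 2.0}
term_b = wr_term(kappa_boltzmann, a, 0, psi)
term_m = wr_term(kappa_metropolis, a, 0, psi)
```

With the Boltzmann selection the conditional expectation does not depend on the current state, which is why `term_b` would be the same for any `x` in the proposal set.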

3. Main result for general multi-proposal WR 

Let $(E, \mathcal{F}_E)$ be a measurable space s.t. $\{x\} \in \mathcal{F}_E$ for all $x \in E$, and $\pi$ be a probability 
measure on $E$. Notice that $E$ is not assumed to be finite. Let $\mathcal{P} = \{A \subset E;\ \mathrm{Card}(A) < \infty\}$ 
be the set of finite subsets of $E$. Let $\mathcal{E} = \bigcup_{n \geq 1} E^n$ and $\mathcal{F}_{\mathcal{E}}$ the smallest $\sigma$-field on $\mathcal{E}$ which 
contains $A_1 \times \cdots \times A_n$ for all $A_i \in \mathcal{F}_E$ and $n \geq 1$. We consider the function $\Gamma$ defined on 
$\mathcal{E}$ taking value on $\mathcal{P}$ such that $\Gamma((x_1, \ldots, x_n))$ is the set $\{x_1, \ldots, x_n\}$ of distinct elements in 
$(x_1, \ldots, x_n)$. We define $\mathcal{F}_{\mathcal{P}}$, a $\sigma$-field on $\mathcal{P}$, as the image of $\mathcal{F}_{\mathcal{E}}$ by the application $\Gamma$. We 
consider a measurable proposition probability kernel $Q : E \times \mathcal{F}_{\mathcal{P}} \to [0,1]$ s.t. 

(16) $\int_{\mathcal{P}} Q(x,dA) = 1$ and $\int_{\mathcal{P}} Q(x,dA)\, \mathbf{1}_{\{x \notin A\}} = 0$ 

(this is the analogue of (9)) and a measurable selection probability kernel $\kappa : E \times \mathcal{P} \times \mathcal{F}_E \to 
[0,1]$ s.t. for $x \in A$ we have $\kappa(x,A,A) = 1$. Let $\delta_y$ be the Dirac mass at point $y$. In 
particular, since $A$ is finite, with a slight abuse of notation, we shall also write $\kappa(x,A,dy) = 
\sum_{z \in A} \kappa(x,A,z) \delta_z(dy)$ and so $\sum_{y \in A} \kappa(x,A,y) = 1$. 
We assume that the analogue of (10) holds, that is 

(17) $\pi(dx) Q(x,dA) \kappa(x,A,dy) = \pi(dy) Q(y,dA) \kappa(y,A,dx)$. 

Example 3.1. We give the analogue of the Metropolis and Boltzmann selection kernel defined 
in (14) and (15) when $E$ is finite. We consider $N(dx,dA) = \pi(dx) Q(x,dA)$ and a measure 
$N_0(dA)$ on $\mathcal{F}_{\mathcal{P}}$ such that $\int_{x \in E} N(dx,dA)$ is absolutely continuous w.r.t. $N_0(dA)$. Since $x \in A$ 
and $A$ is finite $N(dx,dA)$-a.s., the decomposition of $N$ w.r.t. $N_0$ gives that $N(dx,dA) = 
N_0(dA)\, r_A(dx)$, where $r_A(dx) = \sum_{y \in A} r_A(y) \delta_y(dx)$ if $A$ is finite and $r_A(dx) = 0$ otherwise, 
and $(x,A) \mapsto r_A(x)$ is jointly measurable. 

The Metropolis selection kernel is given by: for $x, y \in A$, $r_A \neq 0$, 

(18) $\kappa^M(x,A,y) = \frac{r_A(y)}{\sum_{z \in A \setminus \{x,y\}} r_A(z) + \max(r_A(x), r_A(y))}$ 

if $x \neq y$ and $\kappa^M(x,A,x) = 1 - \sum_{y \in A \setminus \{x\}} \kappa^M(x,A,y)$. 

The Boltzmann selection kernel is given by: for $x, y \in A$, $r_A \neq 0$, 

(19) $\kappa^B(x,A,y) = \kappa^B(A,y) = \frac{r_A(y)}{\sum_{z \in A} r_A(z)}$. 

We choose those two selection kernels to be equal to the uniform distribution on $A$ when 
$r_A = 0$. For those two selection kernels, equation (17) is satisfied. $\triangle$ 



Example 3.2. Let us give a natural example. Let $\nu$ be a reference measure on $E$ with no atoms, 
$\pi$ a probability measure on $E$ with density w.r.t. $\nu$ which we still denote by $\pi$, a selection 
procedure given by $Q(x,A) = P_x(\{x, Y_1, \ldots, Y_n\} \in A)$ for $A \in \mathcal{F}_{\mathcal{P}}$, where $Y_1, \ldots, Y_n$ are 
$E$-valued independent random variables with density w.r.t. $\nu$ given by $q(x, \cdot)$ under $P_x$ and $n \geq 1$ 
is fixed. We use notations of Example 3.1. In this setting, we choose $N_0(dA) = \prod_{x \in A} \nu(dx)$ 
and the function $r_A$ is given by: for $x \in A$, $r_A(x) = \pi(x) \prod_{z \in A \setminus \{x\}} q(x,z)$. $\triangle$ 

The Markov chain $X = (X_n, n \geq 0)$ is defined inductively in the following way. Let $X_0$ be 
a random variable taking values in $E$ with probability distribution $\nu_0$. At step $n$, $X_0, \ldots, X_n$ 
are given. The proposal set at step $n+1$, $A_{n+1}$, is distributed according to $Q(X_n, \cdot)$. Then 
$X_{n+1}$ is chosen distributed according to $\kappa(X_n, A_{n+1}, \cdot)$. This is a particular case of the hit 
and run algorithm [1], where the proposal sets are always finite. It is easy to check that $X$ is 
a Markov chain with transition kernel 

(20) $P(x,dy) = \int_{\mathcal{P}} Q(x,dA) \kappa(x,A,dy)$. 

For $f$ a real valued measurable function defined on $E$, we shall write $Pf(x)$ for $\int P(x,dy) f(y)$ 
when this integral is well defined. 

Condition (17) ensures that $X$ is reversible w.r.t. $\pi$: $\pi(dx)P(x,dy) = \pi(dy)P(y,dx)$. We 
also assume that $X$ is Harris recurrent (see [8] section 9). This is equivalent to assuming that 
for all $B \in \mathcal{F}_E$ s.t. $\pi(B) > 0$ we have $P(\mathrm{Card}\{n \geq 0;\ X_n \in B\} = \infty \mid X_0 = x) = 1$ for all 
$x \in E$. 

Example 3.3. It is easy to check in Example 3.2 that $X$ is Harris recurrent if the random 
walk with transition kernel $q$ is itself Harris recurrent and 

$\forall x \in E, \ Q(x,dA)$-a.e., $\forall y \in A, \quad \kappa(x,A,y) > 0$. 

$\triangle$ 

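For $f$ a real valued measurable function defined on $E$ and $\nu$ a measure on $E$, we shall write 
$\langle \nu, f \rangle$ for $\int \nu(dy) f(y)$ when this integral is well defined. 

Let $f$ be a real-valued measurable function defined on $E$ s.t. $\langle \pi, |f| \rangle < \infty$. Theorem 17.3.2 
in [8] asserts that a.s. $\lim_{n \to \infty} I_n(f) = \langle \pi, f \rangle$, with $I_n(f)$ defined by (4). 

We consider the functional $J_n$ defined by 

$J_n(\beta) = \frac{1}{n} \sum_{k=0}^{n-1} \Big( E\big[ \beta(X_k, A_{k+1}, X_{k+1}) \,\big|\, X_k, A_{k+1} \big] - \beta(X_k, A_{k+1}, X_{k+1}) \Big)$ 

$\phantom{J_n(\beta)} = \frac{1}{n} \sum_{k=0}^{n-1} \Big( \sum_{\tilde x \in A_{k+1}} \kappa(X_k, A_{k+1}, \tilde x) \beta(X_k, A_{k+1}, \tilde x) - \beta(X_k, A_{k+1}, X_{k+1}) \Big)$ 

for $\beta$ any real-valued measurable function defined on $E \times \mathcal{P} \times E$. We set $I_n(f,\beta) = I_n(f) + 
J_n(\beta)$. To prove the convergence and the asymptotic normality of the estimator $I_n(f,\beta)$ of 
$\langle \pi, f \rangle$, we shall use a martingale approach. In particular, we shall assume there exists $F$ a 
solution to the Poisson equation $F - PF = f - \langle \pi, f \rangle$ s.t. $\langle \pi, F^2 \rangle < \infty$ (see theorem 17.4.2 
and condition (V.3) p. 341 in [8] to ensure the existence of such a solution). 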

We introduce the following convenient notation. For a probability measure $\nu$ on $E$ and 
real valued functions $h$ and $g$ defined on $E$, we write, when well defined, 

$\mathrm{Cov}_\nu(h,g) = \langle \nu, gh \rangle - \langle \nu, g \rangle \langle \nu, h \rangle$ and $\mathrm{Var}_\nu(h) = \langle \nu, h^2 \rangle - \langle \nu, h \rangle^2$ 

respectively the covariance of $g$ and $h$ and the variance of $h$ w.r.t. $\nu$. We also write $\kappa_{x,A}(dy)$ 
for the probability measure $\kappa(x,A,dy)$ and $\beta_{x,A}(\cdot)$ for the function $\beta(x,A,\cdot)$. 

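Theorem 3.4. We assume $X$ is Harris recurrent, $\langle \pi, f^2 \rangle < \infty$, there exists a solution $F$ to 
the Poisson equation $F - PF = f - \langle \pi, f \rangle$ such that $\langle \pi, F^2 \rangle < \infty$, and $\beta$ is square integrable: 
$\int \pi(dx) Q(x,dA) \kappa(x,A,dy) \beta(x,A,y)^2 < \infty$. Under those assumptions, we have: 

(i) The estimator $I_n(f,\beta)$ of $\langle \pi, f \rangle$ is consistent: a.s. $\lim_{n \to \infty} I_n(f,\beta) = \langle \pi, f \rangle$. 

(ii) The estimator $I_n(f,\beta)$ of $\langle \pi, f \rangle$ is asymptotically normal: 

$\sqrt{n}\,\big( I_n(f,\beta) - \langle \pi, f \rangle \big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \sigma(f,\beta)^2)$, 

and the asymptotic variance is given by 

(22) $\sigma(f,\beta)^2 = \sigma(f)^2 + \int \pi(dx) Q(x,dA) \big[ \mathrm{Var}_{\kappa_{x,A}}(\beta_{x,A} - F) - \mathrm{Var}_{\kappa_{x,A}}(F) \big]$, 

with $\sigma(f)^2 = \langle \pi, F^2 - (PF)^2 \rangle$. 

(iii) The asymptotic variance $\sigma(f,\beta)^2$ is minimal for $\beta_{x,A} = F$ and 

(23) $\sigma(f,F)^2 = \int \pi(dx) \Big( \int Q(x,dA) \langle \kappa_{x,A}, F \rangle^2 - \Big( \int Q(x,dA) \langle \kappa_{x,A}, F \rangle \Big)^2 \Big) \leq \sigma(f)^2$. 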



Proof. We shall prove the Theorem when $X_0$ is distributed according to $\pi$. The general case 
follows from proposition 17.1.6 in [8], since $X$ is Harris recurrent. 
We set, for $n \geq 1$, 

$\Delta M_n = F(X_n) - PF(X_{n-1}) + \eta(X_{n-1}, A_n, X_n)$, 

where 

$\eta(x,A,y) = \sum_{\tilde x \in A} \big( \kappa(x,A,\tilde x) - \mathbf{1}_{\{y = \tilde x\}} \big) \beta(x,A,\tilde x)$. 

Notice that $\Delta M_n$ is square integrable and that $E[\Delta M_{n+1} \mid \mathcal{G}_n] = 0$, where $\mathcal{G}_n$ is the $\sigma$-field 
generated by $X_0$ and $(A_i, X_i)$ for $1 \leq i \leq n$. In particular $M = (M_n, n \geq 0)$ with $M_n = 
\sum_{k=1}^{n} \Delta M_k$ is a martingale w.r.t. the filtration $(\mathcal{G}_n, n \geq 0)$. Using that $F$ solves the 
Poisson equation, we also have 

(24) $I_n(f,\beta) = \frac{1}{n} M_n - \frac{1}{n} PF(X_n) + \frac{1}{n} PF(X_0) + \langle \pi, f \rangle$. 

As $\langle \pi, F^2 \rangle < \infty$ implies that $\langle \pi, |PF| \rangle < \infty$, we deduce from theorem 17.3.3 in [8] that a.s. 
$\lim_{n \to \infty} \frac{1}{n} PF(X_n) = 0$. In particular part (i) of the Theorem will be proved as soon as we 
check that a.s. $\lim_{n \to \infty} \frac{1}{n} M_n = 0$. 

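We easily compute the bracket of $M_n$: 

$\langle M \rangle_n = \sum_{k=1}^{n} E[\Delta M_k^2 \mid \mathcal{G}_{k-1}] = \sum_{k=1}^{n} h(X_{k-1})$, 

with 

$h(x) = P(F^2)(x) - (PF(x))^2 + \int Q(x,dA) \big[ -2\,\mathrm{Cov}_{\kappa_{x,A}}(\beta_{x,A}, F) + \mathrm{Var}_{\kappa_{x,A}}(\beta_{x,A}) \big]$. 

Elementary computation yields 

$-2\,\mathrm{Cov}_{\kappa_{x,A}}(\beta_{x,A}, F) + \mathrm{Var}_{\kappa_{x,A}}(\beta_{x,A}) = \mathrm{Var}_{\kappa_{x,A}}(\beta_{x,A} - F) - \mathrm{Var}_{\kappa_{x,A}}(F)$. 

Since $\langle \pi, F^2 \rangle < \infty$ and $\int \pi(dx) Q(x,dA) \kappa(x,A,dy) \beta(x,A,y)^2 < \infty$, we have that $h$ is $\pi$ 
integrable. We set $\sigma(f,\beta)^2 = \langle \pi, h \rangle$, that is $\sigma(f,\beta)^2$ is given by (22), thanks to the expression 
of $\sigma(f)^2$ and the fact that $\pi$ is invariant for $P$. Theorem 17.3.2 in [8] asserts that a.s. 
$\lim_{n \to \infty} \frac{1}{n} \langle M \rangle_n = \langle \pi, h \rangle$. 
Then theorem 1.3.15 in [5] implies that a.s. $\lim_{n \to \infty} \frac{1}{n} M_n = 0$. This ends the proof of part 
(i). 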

The proof of part (ii) relies on the central limit theorem for martingales, see theorem 2.1.9 in [5]. We have already proved that a.s. $\lim_{n\to\infty}\frac{1}{n}\langle M\rangle_n = \sigma(f,\beta)^2$. Let us now check Lindeberg's condition. Notice that theorem 17.3.2 in [8] implies that for any $a > 0$, we have

$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^n E\bigl[\Delta M_k^2\,1_{\{\Delta M_k^2 > a\}}\,\big|\,\mathcal{G}_{k-1}\bigr] = \langle\pi,h_a\rangle$,

where $h_a(x) = E\bigl[\Delta M_1^2\,1_{\{\Delta M_1^2 > a\}}\,\big|\,X_0 = x\bigr]$. Notice that $0 \le h_a \le h$ and that $(h_a, a > 0)$ decreases to $0$ as $a$ goes to infinity. We deduce that a.s.

$\limsup_{n\to\infty}\frac{1}{n}\sum_{k=1}^n E\bigl[\Delta M_k^2\,1_{\{\Delta M_k^2 > n\}}\,\big|\,\mathcal{G}_{k-1}\bigr] \le \limsup_{a\to\infty}\langle\pi,h_a\rangle = 0$.

This gives Lindeberg's condition. We deduce then that $(\frac{1}{\sqrt{n}}M_n, n \ge 1)$ converges in distribution to $\mathcal{N}(0,\sigma(f,\beta)^2)$. Then use (24) and that a.s. $\lim_{n\to\infty}\frac{1}{n}(PF(X_{n+1}))^2 = 0$ (thanks to theorem 17.3.3 in [8]) to get part (ii).

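Proof of part (iii). The asymptotic variance $\sigma(f,\beta)^2$ is minimal when $\mathrm{Var}_{\kappa_{x,A}}(\beta_{x,A} - F) = 0$, that is at least for $\beta_{x,A} = F$. Of course, $\sigma(f,F)^2 \le \sigma(f,0)^2 = \sigma(f)^2$. Using that $\pi$ is invariant for $P$ and the definition (20) of $P$, we get

$\sigma(f)^2 = \langle\pi,PF^2\rangle - \langle\pi,(PF)^2\rangle$

$\qquad = \int \pi(dx)Q(x,dA)\,\langle\kappa_{x,A},F^2\rangle - \int \pi(dx)\Bigl(\int Q(x,dA)\,\langle\kappa_{x,A},F\rangle\Bigr)^2$.

And the expression of $\sigma(f,F)^2$ follows from (22).

□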






4. The Boltzmann case 

We work in the general setting of Section 3 with the Boltzmann selection kernel $\kappa$ given by (19) (or simply (18) when $E$ is finite). The next Proposition generalizes Proposition 2.8. It ensures that the asymptotic variance $\sigma(f,f)^2$ of the waste recycling algorithm is smaller than the asymptotic variance $\sigma(f)^2$ of the standard Metropolis-Hastings algorithm, and that $b \mapsto \sigma(f,bf)^2$ is minimal at $b_*$, given by (7). At the same time, we show that the variance $\sigma(f)^2$ is at least divided by two for the optimal choice $\beta(x,A,y) = F(y)$ in our control variate approach.

For $f$ s.t. $\langle\pi,f^2\rangle < \infty$, we set $f_0 = f - \langle\pi,f\rangle$ and

(25) $\Delta(f) = \frac{1}{2}\int \pi(dx)P(x,dy)\,(f_0(x) + f_0(y))^2 = \langle\pi, f_0(f_0 + Pf_0)\rangle$.

Notice that the second equality in (25) is a consequence of the invariance of $\pi$ w.r.t. $P$.

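Proposition 4.1. We assume that $X$ is Harris recurrent, $\langle\pi,f^2\rangle < \infty$, there exists a solution $F$ to the Poisson equation $F - PF = f - \langle\pi,f\rangle$ such that $\langle\pi,F^2\rangle < \infty$. We consider the Boltzmann case: the selection kernel $\kappa$ is given by (19). For $\beta(x,A,y)$ respectively equal to $F(y)$ and $f(y)$, one has

$\sigma(f,F)^2 = \frac{1}{2}\bigl(\sigma(f)^2 - \mathrm{Var}_\pi(f)\bigr)$ and $\sigma(f,f)^2 = \sigma(f)^2 - \Delta(f)$.

The non-negative term $\Delta(f)$ is positive when $\mathrm{Var}_\pi(f) > 0$.

Furthermore, if $\mathrm{Var}_\pi(f) > 0$, then $\langle\pi, f^2 - fPf\rangle = \frac{1}{2}E_\pi\bigl[(f(X_0) - f(X_1))^2\bigr]$ is positive, the function $b \mapsto \sigma(f,bf)^2$ is minimal at

(26) $b_* = \dfrac{\langle\pi,f^2\rangle - \langle\pi,f\rangle^2}{\langle\pi,f^2 - fPf\rangle}$

and $b_* > 1$ when $\sigma(f,f)^2 > 0$.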

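Proof. Recall notations from Example 3.1. We set $\kappa_A^B(dy) = \kappa^B(A,dy)$. For $g$ and $h$ real valued functions defined on $E$, we have

(27) $\int \pi(dx)Q(x,dA)\,\langle\kappa_A^B,g\rangle\langle\kappa_A^B,h\rangle = \int N_0(dA)\,r_A(dx)\,\langle\kappa_A^B,g\rangle\langle\kappa_A^B,h\rangle$

$\qquad = \int N_0(dA)\,\langle r_A,g\rangle\langle\kappa_A^B,h\rangle$

$\qquad = \int \pi(dx)Q(x,dA)\,g(x)\,\langle\kappa_A^B,h\rangle$

$\qquad = \langle\pi,gPh\rangle$,

where we used (19) for the second equality. Using this equality with $h = g = F$ in the first term of the expression of $\sigma(f,F)^2$ given in (23), we obtain

$\sigma(f,F)^2 = \langle\pi,FPF - (PF)^2\rangle = \frac{1}{2}\langle\pi,F^2 - (PF)^2 - (F - PF)^2\rangle = \frac{1}{2}\bigl(\sigma(f)^2 - \mathrm{Var}_\pi(f)\bigr)$,

where we used the Poisson equation (6) for the last equality.
We also get that

$\int \pi(dx)Q(x,dA)\,\bigl[\mathrm{Var}_{\kappa_A^B}(bf - F) - \mathrm{Var}_{\kappa_A^B}(F)\bigr]$

$\qquad = \int \pi(dx)Q(x,dA)\,\bigl[\langle\kappa_A^B,(bf - F)^2\rangle - \langle\kappa_A^B,bf\rangle^2 + 2\langle\kappa_A^B,bf\rangle\langle\kappa_A^B,F\rangle - \langle\kappa_A^B,F\rangle^2 - \langle\kappa_A^B,F^2\rangle + \langle\kappa_A^B,F\rangle^2\bigr]$

$\qquad = \langle\pi, b^2f^2 - 2bfF - b^2fPf + 2bfPF\rangle$

$\qquad = b^2\langle\pi,f^2 - fPf\rangle - 2b\bigl(\langle\pi,f^2\rangle - \langle\pi,f\rangle^2\bigr)$,

where we used (27) for the second equation and (6) for the last equality. We deduce from (22) with $\beta_{x,A} = bf$ that

$\sigma(f,bf)^2 - \sigma(f)^2 = b^2\langle\pi,f^2 - fPf\rangle - 2b\bigl(\langle\pi,f^2\rangle - \langle\pi,f\rangle^2\bigr)$.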

We first check that $\mathrm{Var}_\pi(f) > 0$ implies that $\langle\pi,f^2 - fPf\rangle > 0$. If, when $X_0$ is distributed according to $\pi$, a.s. $f(X_1) = f(X_0)$, then a.s. $k \mapsto f(X_k)$ is constant and by the ergodic theorem this constant is equal to $\langle\pi,f\rangle$. Therefore $\mathrm{Var}_\pi(f) > 0$ implies positivity of $\langle\pi,f^2 - fPf\rangle$, which is equal to $\frac{1}{2}E_\pi\bigl[(f(X_0) - f(X_1))^2\bigr]$ by reversibility of $\pi$ w.r.t. $P$.

Hence when $\mathrm{Var}_\pi(f) > 0$, then $b \mapsto \sigma(f,bf)^2$ is minimal for $b = b_*$, defined by (26).

For the choice $b = 1$, one obtains

(28) $-\sigma(f,f)^2 + \sigma(f)^2 = \langle\pi,f(f + Pf)\rangle - 2\langle\pi,f\rangle^2 = \Delta(f) = \mathrm{Var}_\pi(f) + \langle\pi,f_0Pf_0\rangle$.

By (27), $\langle\pi,f_0Pf_0\rangle = \int \pi(dx)Q(x,dA)\,\langle\kappa_A^B,f_0\rangle^2 \ge 0$ and $\Delta(f)$ is positive when $\mathrm{Var}_\pi(f) > 0$.

Moreover the difference $\langle\pi,fPf\rangle - \langle\pi,f\rangle^2 = \langle\pi,f_0Pf_0\rangle$ is non-negative thanks to (27) and when it is equal to $0$, then (27) implies that $\langle\pi,f_0Pg\rangle = \langle\pi,gPf_0\rangle = 0$ for each function $g$ on $E$ such that $\langle\pi,g^2\rangle < +\infty$. In this case, by (28),

$\sigma(f,f)^2 = \langle\pi,(F + PF)(F - PF)\rangle - \mathrm{Var}_\pi(f)$
$\qquad = \langle\pi,(f_0 + 2PF)f_0\rangle - \mathrm{Var}_\pi(f) = 0$.

Hence when $\mathrm{Var}_\pi(f) > 0$ and $\sigma(f,f)^2 > 0$, we have $\langle\pi,f_0Pf_0\rangle > 0$ and $b_* > 1$. □

5. Further results in the single-proposal case 

The Metropolis-Hastings algorithm corresponds to the single proposal case, that is the particular case of the multi-proposal algorithm of Section 3 where $Q(x,\cdot)$ gives full weight to the set of subsets of $E$ (not assumed to be finite) containing $x$ and at most one other element of $E$. The acceptance probability is then given by $\rho(x,y) = \kappa(x,\{x,y\},y)$ and the selection kernel $Q(x,\cdot)$ on $E$ is the image of the kernel $Q(x,\cdot)$ on subsets of $E$ by any measurable mapping such that the image of $\{x,y\}$ is $y$. See Remark 2.6 in the particular case of $E$ finite. Equation (17) is then equivalent to the following generalization of (1)

(29) $\pi(dx)Q(x,dy)\rho(x,y) = \pi(dy)Q(y,dx)\rho(y,x)$.

Moreover the transition kernel of the Markov chain $X$ is given by

(30) $1_{\{y\neq x\}}P(x,dy) = 1_{\{y\neq x\}}\rho(x,y)Q(x,dy)$ and $P(x,\{x\}) = 1 - \int 1_{\{z\neq x\}}\rho(x,z)Q(x,dz)$.

Motivated by the study of the WR algorithm, which corresponds to $\psi = f$, and of the optimal choice $\psi = F$, we are first going to derive more convenient expressions of $\sigma(f,\psi)^2$ in the single proposal framework. We then use this new expression to construct a counter-example such that $\sigma(f,f)^2 > \sigma(f)^2$. And, when $\rho(x,y) + \rho(y,x)$ is constant on $E^2_* = E^2 \setminus \{(x,x) : x \in E\}$, using again the expression of $\sigma(f,\psi)^2$, we compute the value of $b$ such that $\sigma(f,bf)^2$ is minimal and check that $\sigma(f,f)^2 < \sigma(f)^2$ as soon as $f$ is non constant.

5.1. Another expression of the asymptotic variance. We recall that in the notation $E_\pi$, the subscript $\pi$ means that $X_0$ is distributed according to $\pi$.

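Lemma 5.1. We assume that $\langle\pi,f^2\rangle < \infty$ and there exists a solution $F$ to the Poisson equation (6) such that $\langle\pi,F^2\rangle < +\infty$. Let $\psi$ be square integrable: $\langle\pi,\psi^2\rangle < \infty$. In the single proposal case, we have

$\sigma(f,\psi)^2 - \sigma(f)^2 = -E_\pi\bigl[(1 - \rho(X_0,X_1))\,(F(X_1) - F(X_0))^2\bigr]$

$\qquad + E_\pi\bigl[(1 - \rho(X_0,X_1))\,(\psi(X_1) - F(X_1) - \psi(X_0) + F(X_0))^2\bigr]$.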



Proof. In the single proposal case, $\kappa(x,\{x,y\},y) = 1 - \kappa(x,\{x,y\},x) = \rho(x,y)$ for $x \neq y$. Therefore, for a real valued function $g$ defined on $E$, we have

(31) $\mathrm{Var}_{\kappa_{x,\{x,y\}}}(g) = \rho(x,y)(1 - \rho(x,y))\,(g(y) - g(x))^2$.

Thus we deduce that

$\int \pi(dx)Q(x,dA)\,\mathrm{Var}_{\kappa_{x,A}}(g) = \int \pi(dx)Q(x,dy)\,\rho(x,y)(1 - \rho(x,y))\,(g(y) - g(x))^2$

$\qquad = \int \pi(dx)P(x,dy)\,(1 - \rho(x,y))\,(g(y) - g(x))^2$

$\qquad = E_\pi\bigl[(1 - \rho(X_0,X_1))\,(g(X_1) - g(X_0))^2\bigr]$,

where we used (30) for the second equality. Plugging this formula with $g = \psi - F$ and $g = F$ in (22) gives the result. □
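Identity (31) is the variance of a law with two atoms: mass $\rho(x,y)$ on $y$ and mass $1 - \rho(x,y)$ on $x$. A minimal sketch checking it in exact arithmetic follows; the numerical values of `rho`, `g_x` and `g_y` are arbitrary illustrations, not taken from the paper.

```python
from fractions import Fraction

def two_point_variance(rho, g_x, g_y):
    """Variance of g under the law putting mass rho on y and 1 - rho on x."""
    first_moment = rho * g_y + (1 - rho) * g_x
    second_moment = rho * g_y ** 2 + (1 - rho) * g_x ** 2
    return second_moment - first_moment ** 2

def formula_31(rho, g_x, g_y):
    """Right-hand side of (31)."""
    return rho * (1 - rho) * (g_y - g_x) ** 2

rho, g_x, g_y = Fraction(2, 5), Fraction(-1, 3), Fraction(7, 4)
print(two_point_variance(rho, g_x, g_y) == formula_31(rho, g_x, g_y))  # True
```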



Taking $\psi = F$ and $\psi = f$ in the previous Lemma gives the following Corollary.

Corollary 5.2. We assume that $\langle\pi,f^2\rangle < \infty$ and there exists a solution $F$ to the Poisson equation (6) such that $\langle\pi,F^2\rangle < +\infty$. In the single proposal case, we have:

$\sigma(f,F)^2 - \sigma(f)^2 = -E_\pi\bigl[(1 - \rho(X_0,X_1))\,(F(X_1) - F(X_0))^2\bigr]$,

$\sigma(f,f)^2 - \sigma(f)^2 = -E_\pi\bigl[(1 - \rho(X_0,X_1))\,\bigl((F(X_1) - F(X_0))^2 - (PF(X_1) - PF(X_0))^2\bigr)\bigr]$.

5.2. A counter-example. We are going to construct a counter-example such that $\sigma(f,f)^2 > \sigma(f)^2$ in the Metropolis case, thus proving the statements concerning this case in Proposition 2.3. This counter-example is also such that the optimal choice $\psi = F$ does not achieve variance reduction: $\sigma(f,F)^2 = \sigma(f)^2$. Let $P$ be an irreducible transition matrix on $E = \{a,b,c\}$, with invariant probability measure $\pi$ s.t. $P$ is reversible w.r.t. $\pi$,

$P(a,b) > 0$, $P(a,a) > 0$ and $P(a,c) \neq P(b,c)$.

Let $f$ be defined by $f(x) = 1_{\{x=c\}} - P(x,c)$ for $x \in E$. We have

$\langle\pi,f\rangle = \pi(c) - \sum_{x\in E}\pi(x)P(x,c) = 0$.

The function $F(x) = 1_{\{x=c\}}$ solves the Poisson equation (6): $F - PF = f - \langle\pi,f\rangle$.
Let $\rho \in \Bigl(\dfrac{P(a,b)}{P(a,a) + P(a,b)},\,1\Bigr)$. We set

$Q(x,y) = \begin{cases} P(a,b)/\rho & \text{if } (x,y) = (a,b),\\ P(a,a) - P(a,b)\bigl(\tfrac{1}{\rho} - 1\bigr) & \text{if } (x,y) = (a,a),\\ P(x,y) & \text{otherwise.} \end{cases}$

We choose

$\rho(x,y) = \begin{cases} \rho & \text{if } (x,y) = (a,b),\\ 1 & \text{otherwise.} \end{cases}$

Since $\rho(a,b)\pi(a)Q(a,b) = \rho\,\pi(a)P(a,b)/\rho$, we have $\rho(x,y)\pi(x)Q(x,y) = \pi(x)P(x,y)$ for all $x \neq y \in E$. Equation (1) follows from the reversibility of $\pi$ for $P$. Notice also that (2) holds with $\gamma(u) = \min(1,u)$.

By construction, the matrix $P$ satisfies (3). By Corollary 5.2, we have $\sigma(f,F)^2 - \sigma(f)^2 = 0$ and

(32) $\sigma(f,f)^2 - \sigma(f)^2 = \pi(a)P(a,b)(1 - \rho)\,(P(b,c) - P(a,c))^2 > 0$.



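Let us illustrate these results by simulation for the following specific choice:

$\pi = \dfrac{1}{10}\begin{pmatrix} 6\\ 3\\ 1 \end{pmatrix}$, $P = \dfrac{1}{60}\begin{pmatrix} 38 & 21 & 1\\ 42 & 0 & 18\\ 6 & 54 & 0 \end{pmatrix}$, $\rho = \dfrac{4}{10}$ and $Q = \dfrac{1}{120}\begin{pmatrix} 13 & 105 & 2\\ 84 & 0 & 36\\ 12 & 108 & 0 \end{pmatrix}$.

Then $\sigma(f)^2 - \sigma(f,f)^2 = -0.010115$ amounts to 14% of $\sigma(f)^2 \simeq 0.0728333$.

Using $N = 10\,000$ simulations, we give estimations of the variances $\sigma_n^2$ of $I_n(f)$, $\sigma_{WR,n}^2$ of $I_n(f,f)$ and of the difference $\sigma_n^2 - \sigma_{WR,n}^2$ with asymptotic confidence intervals. The initial variable $X_0$ is generated according to the reversible probability measure $\pi$.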



   n      $\sigma_n^2$          $\sigma_{WR,n}^2$     $\sigma_n^2 - \sigma_{WR,n}^2$
   1      [0.1213 , 0.1339]     [0.1116 , 0.1241]     [ 0.0091 ,  0.0104]
   2      [0.0728 , 0.0779]     [0.0758 , 0.0815]     [-0.0041 , -0.0025]
   5      [0.0733 , 0.0791]     [0.0798 , 0.0859]     [-0.0075 , -0.0058]
   10     [0.0718 , 0.0772]     [0.0800 , 0.0859]     [-0.0094 , -0.0074]
   100    [0.0702 , 0.0751]     [0.0803 , 0.0858]     [-0.0114 , -0.0092]
   1000   [0.0719 , 0.0769]     [0.0811 , 0.0867]     [-0.0105 , -0.0083]
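The counter-example can be re-derived numerically. The sketch below is an illustration under stated assumptions, not the authors' simulation code: it rebuilds $Q$ from $P$ and $\rho$ with exact fractions, evaluates $\sigma(f)^2 = \langle\pi,F^2 - (PF)^2\rangle$ and the gap (32), and then runs a small simulation. It assumes that, in the single proposal case, $I_n(f,f)$ averages $\rho(X_k,Y)f(Y) + (1 - \rho(X_k,Y))f(X_k)$ over the proposals $Y$, and that the tabulated variances are those of $I_n$ multiplied by $n$ (this scaling matches the magnitude of the table). The function names, the seed and the number of runs are hypothetical choices.

```python
from fractions import Fraction as Fr
import random

STATES = ("a", "b", "c")
PI = {"a": Fr(6, 10), "b": Fr(3, 10), "c": Fr(1, 10)}
P = {"a": {"a": Fr(38, 60), "b": Fr(21, 60), "c": Fr(1, 60)},
     "b": {"a": Fr(42, 60), "b": Fr(0), "c": Fr(18, 60)},
     "c": {"a": Fr(6, 60), "b": Fr(54, 60), "c": Fr(0)}}
RHO = Fr(4, 10)

def acceptance(x, y):
    """rho(x, y): equal to RHO for the move a -> b, and to 1 otherwise."""
    return RHO if (x, y) == ("a", "b") else Fr(1)

# Proposal matrix Q built from P as in the text.
Q = {x: dict(P[x]) for x in STATES}
Q["a"]["b"] = P["a"]["b"] / RHO
Q["a"]["a"] = P["a"]["a"] - P["a"]["b"] * (1 / RHO - 1)

# f(x) = 1_{x=c} - P(x,c); F = 1_{x=c} solves the Poisson equation, PF(x) = P(x,c).
f = {x: (1 if x == "c" else 0) - P[x]["c"] for x in STATES}
sigma2 = PI["c"] - sum(PI[x] * P[x]["c"] ** 2 for x in STATES)
gap = PI["a"] * P["a"]["b"] * (1 - RHO) * (P["b"]["c"] - P["a"]["c"]) ** 2  # (32)
print(sigma2, gap)   # 437/6000 and 2023/200000, i.e. 0.0728333... and 0.010115

Q_W = {x: [float(Q[x][y]) for y in STATES] for x in STATES}
F_VAL = {x: float(f[x]) for x in STATES}

def one_run(n, rng):
    """One chain of length n started from pi; returns (I_n(f), I_n(f, f))."""
    x = rng.choices(STATES, weights=[float(PI[s]) for s in STATES])[0]
    mh = wr = 0.0
    for _ in range(n):
        y = rng.choices(STATES, weights=Q_W[x])[0]       # proposal
        r = float(acceptance(x, y))
        wr += r * F_VAL[y] + (1 - r) * F_VAL[x]          # uses the proposal
        if rng.random() < r:
            x = y
        mh += F_VAL[x]                                   # uses the new state only
    return mh / n, wr / n

def scaled_variances(n, runs, seed=0):
    """Empirical n * Var(I_n(f)) and n * Var(I_n(f, f)) over independent runs."""
    rng = random.Random(seed)
    draws = [one_run(n, rng) for _ in range(runs)]
    def n_var(values):
        m = sum(values) / runs
        return n * sum((v - m) ** 2 for v in values) / (runs - 1)
    return n_var([d[0] for d in draws]), n_var([d[1] for d in draws])

if __name__ == "__main__":
    print(scaled_variances(n=100, runs=2000))
```

With a few thousand runs the second estimate should come out larger than the first, in line with the sign of (32); the exact figures depend on the seed and are not meant to reproduce the confidence intervals of the table.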



5.3. Case of a constant sum $\rho(x,y) + \rho(y,x)$. Under Boltzmann selection rule, according to Proposition 4.1, the asymptotic variance $\sigma(f,f)^2$ of $I_n^{WR}(f) = I_n(f,f)$ is smaller than the one $\sigma(f)^2$ of $I_n(f)$ and $\sigma(f,bf)^2$ is minimal for $b = b_*$, given by (26). In the single proposal case, Boltzmann selection rule ensures that $\rho(x,y) + \rho(y,x) = 1$ on $E^2_* = E^2 \setminus \{(x,x) : x \in E\}$. It turns out that we are still able to prove the same results as soon as $\rho(x,y) + \rho(y,x)$ is constant on $E^2_*$. Notice that $\mathrm{Var}_\pi(f) \ge 0$ and that the trivial case $\mathrm{Var}_\pi(f) = 0$ corresponds to $f$ constant $\pi$-a.s.

Proposition 5.3. We assume $\langle\pi,f^2\rangle < \infty$, $\mathrm{Var}_\pi(f) > 0$, there exists a solution $F$ to the Poisson equation (6) such that $\langle\pi,F^2\rangle < \infty$. We consider the single proposal case and assume that there exists $\alpha \in (0,2)$ such that

(33) $\pi(dx)Q(x,dy)$ a.e. on $E^2_*$, $\rho(x,y) + \rho(y,x) = \alpha$.

Then we have:

i) $\langle\pi,f^2 - fPf\rangle = \frac{1}{2}E_\pi\bigl[(f(X_0) - f(X_1))^2\bigr]$ is positive.

ii) We have

(34) $\sigma(f,\psi)^2 - \sigma(f)^2 = -(1 - \alpha/2)\,E_\pi\bigl[(F(X_1) - F(X_0))^2\bigr] + (1 - \alpha/2)\,E_\pi\bigl[(\psi(X_1) - F(X_1) - \psi(X_0) + F(X_0))^2\bigr]$

for any real valued function $\psi$ on $E$ such that $\langle\pi,\psi^2\rangle < \infty$.

iii) The function $b \mapsto \sigma(f,bf)^2$ is minimal at $b_*$ given by (26) and $b_* \ge 1/\alpha$.

iv) $\sigma(f,f)^2 - \sigma(f)^2 = -(2 - \alpha)\Delta(f) < 0$, where $\Delta(f)$ is given by (25).

Proof. Statement i) follows from the proof of Proposition 4.1.

For statement ii), notice that by reversibility of $\pi$, we deduce from Lemma 5.1 that

$\sigma(f,\psi)^2 - \sigma(f)^2 = -E_\pi\bigl[(1 - \rho(X_1,X_0))\,(F(X_1) - F(X_0))^2\bigr]$

$\qquad + E_\pi\bigl[(1 - \rho(X_1,X_0))\,(\psi(X_1) - F(X_1) - \psi(X_0) + F(X_0))^2\bigr]$.

This and Lemma 5.1 imply (34).

For iii), using (34) with $\psi = bf$, it is straightforward to get that $\sigma(f,bf)^2$ is minimal when $b$ equals

$\dfrac{E_\pi\bigl[(f(X_1) - f(X_0))(F(X_1) - F(X_0))\bigr]}{E_\pi\bigl[(f(X_1) - f(X_0))^2\bigr]} = \dfrac{\langle\pi,f(F - PF)\rangle}{\langle\pi,f^2 - fPf\rangle} = \dfrac{\langle\pi,f^2\rangle - \langle\pi,f\rangle^2}{\langle\pi,f^2 - fPf\rangle}$.

Remarking that $b_* = \dfrac{\mathrm{Var}_\pi(f)}{\alpha\,\mathrm{Var}_\pi(f) - \langle\pi,f_0Pf_0 + (\alpha - 1)f_0^2\rangle}$ and using Lemma 5.4 below, one deduces that $b_* \ge 1/\alpha$.

We now prove iv). Recall that $f_0 = f - \langle\pi,f\rangle$. Since $\langle\pi,f_0(f_0 + Pf_0)\rangle = (2 - \alpha)\mathrm{Var}_\pi(f) + \langle\pi,f_0Pf_0 + (\alpha - 1)f_0^2\rangle$, we deduce from Lemma 5.4 that $\Delta(f)$ given by (25) is positive. We have

$\frac{1}{2}E_\pi\bigl[(f(X_1) - F(X_1) - f(X_0) + F(X_0))^2 - (F(X_1) - F(X_0))^2\bigr]$

$\qquad = \frac{1}{2}E_\pi\bigl[(f_0(X_1) - f_0(X_0))^2\bigr] - E_\pi\bigl[(f_0(X_1) - f_0(X_0))(F(X_1) - F(X_0))\bigr]$

$\qquad = \langle\pi,f_0^2 - f_0Pf_0\rangle - 2\langle\pi,f_0(F - PF)\rangle$

$\qquad = -\langle\pi,f_0(f_0 + Pf_0)\rangle$,

where we used that $\pi$ is invariant for $P$ and that $P$ is reversible with respect to $\pi$ for the second equality and that $F$ solves (6) for the last equality. We conclude using (34) with $\psi = f$. □
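Statements ii) to iv) can be checked in exact arithmetic on a small chain. The sketch below is only a sanity check under assumptions that are not in the paper: a three-point state space, a uniform proposal, and a hypothetical acceptance rule $\rho(x,y) = \alpha\,\pi(y)/(\pi(x) + \pi(y))$, chosen so that (29) and (33) hold with $\alpha = 6/5$. The function $F$ is picked arbitrarily and $f$ is defined as $F - PF$, so that $F$ solves the Poisson equation and $\langle\pi,f\rangle = 0$.

```python
from fractions import Fraction as Fr

S = range(3)
pi = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]   # target law (arbitrary choice)
alpha = Fr(6, 5)                      # assumed constant value of rho(x,y) + rho(y,x)

def rho(x, y):
    # Hypothetical acceptance rule: alpha times a Barker-type ratio. With the
    # uniform proposal below it satisfies (29) and (33), and stays below 1 here.
    return alpha * pi[y] / (pi[x] + pi[y])

Q = [[Fr(0) if x == y else Fr(1, 2) for y in S] for x in S]   # uniform proposal
P = [[Q[x][y] * rho(x, y) if x != y else Fr(0) for y in S] for x in S]
for x in S:
    P[x][x] = 1 - sum(P[x])           # rejection mass, as in (30)

def apply(M, g):
    return [sum(M[x][y] * g[y] for y in S) for x in S]

def integ(g):
    return sum(pi[x] * g[x] for x in S)

F = [Fr(1), Fr(-2), Fr(3)]            # arbitrary; f is defined so that F solves (6)
PF = apply(P, F)
f = [F[x] - PF[x] for x in S]         # then <pi, f> = 0 by invariance of pi
Pf = apply(P, f)

def variance_gap(psi, weight):
    """sum over x != y of pi(x) P(x,y) weight(x,y) [(d(psi - F))^2 - (dF)^2]."""
    total = Fr(0)
    for x in S:
        for y in S:
            if x != y:
                d_psi = psi[y] - F[y] - psi[x] + F[x]
                d_F = F[y] - F[x]
                total += pi[x] * P[x][y] * weight(x, y) * (d_psi ** 2 - d_F ** 2)
    return total

def lemma_5_1(psi):
    return variance_gap(psi, lambda x, y: 1 - rho(x, y))

def formula_34(psi):
    return variance_gap(psi, lambda x, y: 1 - alpha / 2)

delta = integ([f[x] * (f[x] + Pf[x]) for x in S])     # (25), here f_0 = f
b_star = integ([f[x] ** 2 for x in S]) / integ([f[x] * (f[x] - Pf[x]) for x in S])

def wr_gap(b):
    """sigma(f, b f)^2 - sigma(f)^2 computed through Lemma 5.1."""
    return lemma_5_1([b * v for v in f])

print(lemma_5_1(f) == formula_34(f) == -(2 - alpha) * delta)   # ii) and iv)
print(b_star >= 1 / alpha)                                     # iii)
```

Since $b \mapsto \sigma(f,bf)^2$ is a quadratic function of $b$, comparing `wr_gap(b_star)` with `wr_gap` at two neighbouring values of `b` is enough to confirm that `b_star` is the minimizer on this example.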

Lemma 5.4. Let $h$ be a real valued function defined on $E$ such that $\langle\pi,h^2\rangle < \infty$. Under hypothesis (33), we have $\langle\pi,hPh + (\alpha - 1)h^2\rangle \ge 0$.

Proof. Using (30) then (33), we obtain

$\langle\pi,hPh + (\alpha - 1)h^2\rangle = \int_{E^2_*}\pi(dx)Q(x,dy)\,\rho(x,y)h(x)h(y) + \int_E \pi(dx)\Bigl[\alpha - \int 1_{\{y\neq x\}}Q(x,dy)\rho(x,y)\Bigr]h^2(x)$

$\qquad = \int_{E^2_*}\pi(dx)Q(x,dy)\,\bigl[\rho(x,y)h(x)h(y) + \rho(y,x)h^2(x)\bigr] + \alpha\int_E \pi(dx)Q(x,\{x\})h^2(x)$.

To conclude, it is enough to check that the first term in the r.h.s. is nonnegative. Using (33) and (29) for the first equality, we get

$\alpha\int_{E^2_*}\pi(dx)Q(x,dy)\,\bigl[\rho(x,y)h(x)h(y) + \rho(y,x)h^2(x)\bigr]$

$\qquad = \int_{E^2_*}\pi(dx)Q(x,dy)\,\rho(y,x)\bigl[\rho(x,y)h(x)h(y) + \rho(y,x)h^2(x)\bigr]$

$\qquad\quad + \int_{E^2_*}\pi(dy)Q(y,dx)\,\rho(y,x)\bigl[\rho(x,y)h(x)h(y) + \rho(y,x)h^2(x)\bigr]$

$\qquad = \int_{E^2_*}\pi(dx)Q(x,dy)\,\bigl[\rho(y,x)h(x) + \rho(x,y)h(y)\bigr]^2$

$\qquad \ge 0$.

□

6. Other remarks

We work in the general setting of Section 3.



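6.1. About the estimator $I_n(f + P\psi - \psi)$. Motivated by Remark 2.5 on the study of $I_n(f + P\psi - \psi)$, we compute the asymptotic variance $\tilde\sigma(f,\beta)^2$ of

$I_n(f) + \frac{1}{n}\sum_{k=0}^{n-1}\Bigl(\int Q(X_k,dA)\kappa(X_k,A,d\tilde x)\beta(X_k,A,\tilde x) - \beta(X_k,A_{k+1},X_{k+1})\Bigr)$

$\qquad = I_n(f) + \frac{1}{n}\sum_{k=0}^{n-1}\bigl(E[\beta(X_k,A_{k+1},X_{k+1})\,|\,X_k] - \beta(X_k,A_{k+1},X_{k+1})\bigr)$.

Following the proof of Theorem 3.4, one obtains that the above estimator of $\langle\pi,f\rangle$ is, under the hypotheses of Theorem 3.4, convergent and asymptotically normal with asymptotic variance

$\tilde\sigma(f,\beta)^2 = \sigma(f,\beta)^2 + \int \pi(dx)\,\bigl[\mathrm{Var}_{Q(x,\cdot)}(\kappa\beta_x - \kappa F_x) - \mathrm{Var}_{Q(x,\cdot)}(\kappa F_x)\bigr]$,

where $\mathrm{Var}_{Q(x,\cdot)}(\varphi) = \int Q(x,dA)\varphi(A)^2 - \bigl(\int Q(x,dA)\varphi(A)\bigr)^2$, $\kappa\beta_x(A) = \langle\kappa_{x,A},\beta_{x,A}\rangle$ and $\kappa F_x(A) = \langle\kappa_{x,A},F\rangle$.

Notice that the sign of $\tilde\sigma(f,\beta)^2 - \sigma(f,\beta)^2$ depends on $\beta$ (take $\beta_{x,A} = F$ and $\beta_{x,A} = -F$).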



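6.2. Changing the selection kernel in $J_n$. Let $\kappa' \neq \kappa$ be such that (17) (or simply (10) if $E$ is finite) still holds when $\kappa$ is replaced by $\kappa'$, and let $J'_n(\psi)$ and $J'_n(\beta)$ be defined like $J_n(\psi)$ and $J_n(\beta)$ with the chain $X$ unchanged but with $\kappa(X_k,A_{k+1},\tilde x)$ replaced by $\kappa'(X_k,A_{k+1},\tilde x)$ in (12) and (21). Thus, we have

$J'_n(\psi) = \frac{1}{n}\sum_{k=0}^{n-1}\Bigl(\sum_{\tilde x\in A_{k+1}}\kappa'(X_k,A_{k+1},\tilde x)\psi(\tilde x) - \psi(X_{k+1})\Bigr)$.

Note that in general $\sum_{\tilde x\in A_{k+1}}\kappa'(X_k,A_{k+1},\tilde x)\psi(\tilde x) \neq E[\psi(X_{k+1})\,|\,X_k,A_{k+1}]$.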

In the single proposal case, Frenkel [7] suggests that $J'_n(f)$ can also be used as a control variate. In general, for a real valued function $\beta$ defined on $E \times \mathcal{P} \times E$, the almost sure limit of $J'_n(\beta)$ is different from zero, which means that the estimator $I_n(f) + J'_n(\beta)$ does not converge to $\langle\pi,f\rangle$. However, when $\beta(x,A,\cdot) = \psi(\cdot)$, Lemma 6.1 below ensures that the estimator $I_n(f) + J'_n(\psi)$ of $\langle\pi,f\rangle$ is convergent. It is also easy to prove that this estimator is asymptotically normal and to compute its asymptotic variance, but we have not been able to compare it with the asymptotic variance of $I_n(f)$.

Lemma 6.1. We assume $X$ is Harris recurrent, $\langle\pi,f^2\rangle < \infty$, there exists a solution $F$ to the Poisson equation $F - PF = f - \langle\pi,f\rangle$ such that $\langle\pi,F^2\rangle < \infty$, and $\psi$ is such that $\langle\pi,\psi^2\rangle < \infty$. Under those assumptions, the estimator $I_n(f) + J'_n(\psi)$ of $\langle\pi,f\rangle$ is consistent: a.s. $\lim_{n\to\infty} I_n(f) + J'_n(\psi) = \langle\pi,f\rangle$.

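Proof. We set

$\Delta R_n = \int \kappa'(X_{n-1},A_n,d\tilde x)\psi(\tilde x) - \int Q(X_{n-1},dA)\kappa'(X_{n-1},A,d\tilde x)\psi(\tilde x)$.

Notice that $\Delta R_n$ is square integrable and that $E[\Delta R_{n+1}\,|\,\mathcal{G}_n] = 0$, where $\mathcal{G}_n$ is the $\sigma$-field generated by $X_0$ and $(A_i,X_i)$ for $1 \le i \le n$. In particular $R = (R_n, n \ge 0)$ with $R_n = \sum_{k=1}^n \Delta R_k$ is a martingale w.r.t. the filtration $(\mathcal{G}_n, n \ge 0)$. Notice that

$J'_n(\psi) = \frac{1}{n}R_n + I_n(\gamma) - \frac{1}{n}\int Q(X_n,dA)\kappa'(X_n,A,d\tilde x)\psi(\tilde x) + \frac{1}{n}\int Q(X_0,dA)\kappa'(X_0,A,d\tilde x)\psi(\tilde x)$,

where $\gamma(x) = \int Q(x,dA)\kappa'(x,A,d\tilde x)\psi(\tilde x) - \psi(x)$. Following the proof of Theorem 3.4, we easily get that a.s. $\lim_{n\to\infty}\frac{1}{n}R_n = 0$ and that a.s.

$\lim_{n\to\infty} J'_n(\psi) = \lim_{n\to\infty} I_n(\gamma) = \langle\pi,\gamma\rangle$.

Using (17) satisfied by $\kappa'$ instead of $\kappa$, we get that $\langle\pi,\gamma\rangle = 0$. This ends the proof of the Lemma. □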
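The last step of the proof, $\langle\pi,\gamma\rangle = 0$, can be illustrated in the single proposal case, where (17) reduces to (29) and $\gamma(x) = \sum_{y\neq x} Q(x,y)\rho'(x,y)(\psi(y) - \psi(x))$ for an acceptance probability $\rho'$ playing the role of $\kappa'$. The sketch below is an illustration only: the state space, $\pi$, $Q$ and $\psi$ are arbitrary, `barker` is one assumed example of a rule satisfying (29), and `constant_half` is a rule that violates it, included for contrast.

```python
from fractions import Fraction as Fr

S = range(3)
pi = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]                           # arbitrary target law
Q = [[Fr(0) if x == y else Fr(1, 2) for y in S] for x in S]   # uniform proposal
psi = [Fr(2), Fr(-1), Fr(5)]                                  # arbitrary function

def barker(x, y):
    """An acceptance rule satisfying (29): pi(x)Q(x,y)rho'(x,y) is symmetric."""
    return pi[y] * Q[y][x] / (pi[x] * Q[x][y] + pi[y] * Q[y][x])

def constant_half(x, y):
    """A rule that does NOT satisfy (29) for this non-uniform pi."""
    return Fr(1, 2)

def pi_gamma(rho_prime):
    """<pi, gamma> with gamma(x) = sum_{y != x} Q(x,y) rho'(x,y) (psi(y) - psi(x))."""
    total = Fr(0)
    for x in S:
        for y in S:
            if x != y:
                total += pi[x] * Q[x][y] * rho_prime(x, y) * (psi[y] - psi[x])
    return total

print(pi_gamma(barker))          # 0: the control variate has zero limit
print(pi_gamma(constant_half))   # 3/8: without (29) the limit is biased
```

The second value shows why the condition on $\kappa'$ matters: when it fails, the almost sure limit of $J'_n(\psi)$ is non-zero and the corrected estimator is no longer consistent.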